Improving the Multilingual User Experience of Wikipedia Using Cross-Language Name Search

نویسندگان

  • Raghavendra Udupa
  • Mitesh M. Khapra
چکیده

Although Wikipedia has emerged as a powerful collaborative Encyclopedia on the Web, it is only partially multilingual as most of the content is in English and a small number of other languages. In real-life scenarios, nonEnglish users in general and ESL/EFL 1 users in particular, have a need to search for relevant English Wikipedia articles as no relevant articles are available in their language. The multilingual experience of such users can be significantly improved if they could express their information need in their native language while searching for English Wikipedia articles. In this paper, we propose a novel crosslanguage name search algorithm and employ it for searching English Wikipedia articles in a diverse set of languages including Hebrew, Hindi, Russian, Kannada, Bangla and Tamil. Our empirical study shows that the multilingual experience of users is significantly improved by our approach.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...

متن کامل

Exploring the Effects of Language Skills on Multilingual Web Search

Multilingual access is an important area of research, especially given the growth in multilingual users of online resources. A large body of research exists for Cross-Language Information Retrieval (CLIR); however, little of this work has considered the language skills of the end user, a critical factor in providing effective multilingual search functionality. In this paper we describe an exper...

متن کامل

Document Categorization using Multilingual Associative Networks based on Wikipedia

Associative networks are a connectionist language model with the ability to categorize large sets of documents. In this research we combine monolingual associative networks based on Wikipedia to create a larger, multilingual associative network, using the cross-lingual connections between Wikipedia articles. We prove that such multilingual associative networks perform better than monolingual as...

متن کامل

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

متن کامل

Classifying Wikipedia Articles into NE's Using SVM's with Threshold Adjustment

In this paper, a method is presented to recognize multilingual Wikipedia named entity articles. This method classifies multilingual Wikipedia articles using a variety of structured and unstructured features and is aided by cross-language links and features in Wikipedia. Adding multilingual features helps boost classification accuracy and is shown to effectively classify multilingual pages in a ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010